Combining Visual and Acoustic Speech Signals with a Neural Network Improves Intelligibility
Authors
R.E. Jenkins, The Applied Physics Laboratory, The Johns Hopkins University, Laurel, MD 20707
Abstract
Acoustic speech recognition degrades in the presence of noise. Compensatory information is available from the visual speech signals around the speaker's mouth. Previous attempts at using these visual speech signals to improve automatic speech recognition systems have combined the acoustic and visual speech information at a symbolic level using heuristic rules. In this paper, we demonstrate an alternative approach to fusing the visual and acoustic speech information by training feedforward neural networks to map the visual signal onto the corresponding short-term spectral amplitude envelope (STSAE) of the acoustic signal. This information can be directly combined with the degraded acoustic STSAE. Significant improvements are demonstrated in vowel recognition from noise-degraded acoustic signals. These results are compared with the performance of humans, as well as with other pattern matching and estimation algorithms.
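The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the layer sizes, learning rate, and all function names (`init_params`, `predict`, `train`, `fuse`) are hypothetical, and the weighted average in `fuse` is only one assumed way of "directly combining" the visual estimate with the degraded acoustic STSAE.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, seed=0):
    # Small random weights for a one-hidden-layer feedforward network.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def predict(p, X):
    # Map visual feature vectors to estimated spectral envelopes.
    return sigmoid(X @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]

def train(p, X, Y, lr=0.1, epochs=500):
    # Batch gradient descent on mean squared error (plain backpropagation).
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ p["W1"] + p["b1"])
        err = (H @ p["W2"] + p["b2"] - Y) / n
        dH = (err @ p["W2"].T) * H * (1.0 - H)
        p["W2"] -= lr * (H.T @ err)
        p["b2"] -= lr * err.sum(axis=0)
        p["W1"] -= lr * (X.T @ dH)
        p["b1"] -= lr * dH.sum(axis=0)
    return p

def fuse(noisy_env, visual_env, w):
    # Assumed fusion rule: w=1 trusts the acoustic envelope, w=0 the visual estimate.
    return w * noisy_env + (1.0 - w) * visual_env

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Synthetic stand-ins (assumptions): 20-dim "image" vectors, 8-band envelopes.
rng = np.random.default_rng(1)
X = rng.random((200, 20))
A = rng.normal(0.0, 0.3, (20, 8))
Y = np.tanh(X @ A)

params = init_params(20, 10, 8)
mse_before = mse(predict(params, X), Y)
train(params, X, Y)
mse_after = mse(predict(params, X), Y)

noisy = Y + rng.normal(0.0, 1.0, Y.shape)   # noise-degraded acoustic envelope
fused = fuse(noisy, predict(params, X), w=0.5)
print("visual-only MSE before/after training:", mse_before, mse_after)
print("noisy vs fused MSE:", mse(noisy, Y), mse(fused, Y))
```

In practice the weight `w` would presumably depend on the acoustic signal-to-noise ratio; the fixed value of 0.5 here is purely illustrative.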
Related Articles
Speech Intelligibility in Persian Children with Down Syndrome
Objectives: One of the most effective methods to describe speech disorders is the measurement of speech intelligibility. Speech intelligibility indicates the extent to which the acoustic signal produced by the speaker is correctly received by the listener. The purpose of this study was to investigate speech intelligibility in Persian children with Down syndrome, aged 3 to 5 years, who had spo...
Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)
Commercial quadcopters, which have many private, commercial, and public sector applications, are a rapidly advancing technology. Currently, nothing guarantees the safe operation of these devices in the community. This paper presents three automatic methods for identifying commercial quadcopters. Among these three techniques, two are based on deep neural networks in whi...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partly due to high speaker variability in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert spoken utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Combining Neural Network with Genetic Algorithm for prediction of S4 Parameter using GPS measurement
Ionospheric plasma bubbles cause unpredictable changes in the ionospheric electron density. These variations in the ionospheric layer can cause a phenomenon known as ionospheric scintillation. Ionospheric scintillation can affect the phase and amplitude of radio signals traveling through this medium. This phenomenon occurs frequently around the magnetic equator and in low latitu...